Minoan linguistic resources: The Linear A Digital Corpus

Authors

  • Tommaso Petrolito
  • Ruggero Petrolito
  • Francesco Perono Cacciafoco
  • Grégoire Winterstein
Abstract

This paper describes the Linear A/Minoan digital corpus and the approaches we applied to develop it. Our aim is to set up a suitable study resource for Linear A and Minoan. First, we introduce Linear A and Minoan to make clear why a digital, marked-up corpus of the existing Linear A transcriptions is needed. Second, we list and describe some of the existing resources for Linear A: Linear A documents (seals, statuettes, vessels, etc.), the traditional encoding systems (standard code numbers referring to distinct symbols), a Linear A font, and the newest Unicode Standard character set for Linear A (released on June 16th, 2014). Third, we explain our choices concerning the data format: why we decided to digitize the Linear A resources, why we decided to convert all the transcriptions into standard Unicode characters, why we decided to use an XML format, and why we decided to implement the TEI-EpiDoc DTD. Fourth, we describe the development process (from data collection to the issues we faced and the strategies used to solve them) and a new font we developed (synchronized with the Unicode character set) to make the data readable even on systems that have not been updated. Finally, we discuss the corpus from a Cultural Heritage preservation perspective and suggest some future work.
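As a rough illustration of the data-format choices named in the abstract (Unicode characters inside an XML markup), the following Python sketch wraps a sign sequence in a minimal TEI-style edition block. It is not the authors' schema or tooling: the function names, the `n` attribute used for the document identifier, and the sample sign sequence are all hypothetical, and only the Linear A block range (U+10600 to U+1077F) is taken from the Unicode Standard.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Linear A block as defined in the Unicode Standard (added in version 7.0).
LINEAR_A_START, LINEAR_A_END = 0x10600, 0x1077F


def is_linear_a(ch: str) -> bool:
    """Return True if the single character lies in the Linear A block."""
    return LINEAR_A_START <= ord(ch) <= LINEAR_A_END


def sign_label(ch: str) -> str:
    """Return the sign's Unicode name, or a U+XXXX fallback if the
    local Unicode database is too old to know the character."""
    return unicodedata.name(ch, f"U+{ord(ch):04X}")


def encode_inscription(doc_id: str, lines: list[str]) -> str:
    """Wrap transcription lines in a minimal TEI-style edition <div>.

    Hypothetical layout: one <ab> block with an <lb n="..."/> before each
    line, loosely following EpiDoc conventions. The real corpus may differ.
    """
    div = ET.Element("div", {"type": "edition", "n": doc_id})
    ab = ET.SubElement(div, "ab")
    for number, text in enumerate(lines, start=1):
        lb = ET.SubElement(ab, "lb", {"n": str(number)})
        lb.tail = text  # the line's text follows its line-break marker
    return ET.tostring(div, encoding="unicode")


# Placeholder sign sequence (first two code points of the block),
# not a real transcription of any document.
sample = chr(0x10600) + chr(0x10601)
xml_text = encode_inscription("EXAMPLE-1", [sample])
print(xml_text)
print([sign_label(ch) for ch in sample])
```

Storing the signs as Unicode characters rather than as code numbers keeps the XML searchable with ordinary text tools, while the fallback in `sign_label` mirrors the abstract's concern about systems that have not been updated.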


Similar resources

A Contrastive Study of Metadiscourse in English and Persian Editorials

The original impetus for this cross-linguistic study came from a need to explore the effect of cultural factors and generic conventions on the use and distribution of metadiscourse within a single genre. To this end, the study as a contrastive rhetoric research, examined a corpus of 60 newspaper editorials (written in English and Persian) culled from 10 elite newspapers in America and Iran. Bas...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Developments of Swahili resources for an automatic speech recognition system

This article describes our efforts to provide ASR resources for Swahili, a Bantu language spoken in a wide area of East Africa. We start with an introduction on the language situation, both at linguistic and digital level. Then, we report the selected strategies to develop a text corpus, a pronunciation dictionary and a speech corpus for this under-resourced language. We explore methodologies a...

Global Open Resources and Information for Language and Linguistic Analysis (GORILLA)

The infrastructure Global Open Resources and Information for Language and Linguistic Analysis (GORILLA) was created as a resource that provides a bridge between disciplines such as documentary, theoretical, and corpus linguistics, speech and language technologies, and digital language archiving services. GORILLA is designed as an interface between digital language archive services and language ...

Automatic Extraction of Linguistic Data from Digitized Documents

This paper presents a system for automatically extracting linguistic data from digitized linguistic documents using a combination of existing software packages and custom scripts. The system is designed to leverage existing resources in online digital libraries in order to bootstrap the creation of large, multi-lingual linguistic corpora, which can then be used to conduct data-driven experiment...

Publication date: 2015